
    Matching Interest Points Using Projective Invariant Concentric Circles

    We present a new method to perform reliable matching between different images. This method exploits a projective invariant property between concentric circles and their projected ellipses to find complete region correspondences centered on interest points. The method matches interest points while allowing for a full perspective transformation and exploiting all the available luminance information in the regions. Experiments have been conducted on many different data sets to compare our approach to SIFT local descriptors. The results show the new method offers increased robustness to partial visibility, object rotation in depth, and viewpoint angle change. Singapore-MIT Alliance (SMA)
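    As a minimal sketch of the kind of invariance the abstract relies on (not the paper's actual matching algorithm), the classical projective invariant of a conic pair, the eigenvalues of C1⁻¹C2 up to scale, is preserved when concentric circles project to ellipses; the names and numbers below are illustrative.

```python
# Sketch: a pair of concentric circles keeps a scale-free "fingerprint"
# (eigenvalues of C1^{-1} C2) after any homography maps them to ellipses.
# This illustrates the invariance only; it is not the paper's matching method.
import numpy as np

def circle_conic(cx, cy, r):
    # 3x3 symmetric matrix of x^2 + y^2 - 2*cx*x - 2*cy*y + cx^2 + cy^2 - r^2 = 0
    return np.array([[1.0, 0.0, -cx],
                     [0.0, 1.0, -cy],
                     [-cx, -cy, cx**2 + cy**2 - r**2]])

def pair_invariant(C1, C2):
    # Eigenvalues of C1^{-1} C2 are preserved when the same projective map is
    # applied to both conics; normalize to unit product to remove matrix scale.
    lam = np.linalg.eigvals(np.linalg.inv(C1) @ C2)
    return np.sort(np.real(lam / np.prod(lam) ** (1.0 / 3.0)))

C1, C2 = circle_conic(0, 0, 1.0), circle_conic(0, 0, 2.0)   # concentric circles

H = np.array([[1.1, 0.2, 3.0],      # an arbitrary perspective transform
              [-0.1, 0.9, -2.0],
              [1e-3, 2e-3, 1.0]])
Hi = np.linalg.inv(H)               # a conic C maps to H^{-T} C H^{-1}
C1p, C2p = Hi.T @ C1 @ Hi, Hi.T @ C2 @ Hi

print(pair_invariant(C1, C2))       # approx. [0.63 0.63 2.52]
print(pair_invariant(C1p, C2p))     # same values after projection
```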

    Models for multi-view object class detection

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 99-105). Learning how to detect objects from many classes in a wide variety of viewpoints is a key goal of computer vision. Existing approaches, however, require excessive amounts of training data. Implementors need to collect numerous training images not only to cover changes in the same object's shape due to viewpoint variation, but also to accommodate the variability in appearance among instances of the same class. We introduce the Potemkin model, which exploits the relationship between 3D objects and their 2D projections for efficient and effective learning. The Potemkin model can be constructed from a few views of an object of the target class. We use the Potemkin model to transform images of objects from one view to several other views, effectively multiplying their value for class detection. This approach can be coupled with any 2D image-based detection system. We show that automatically transformed images dramatically decrease the data requirements for multi-view object class detection. The Potemkin model also allows detection systems to reconstruct the 3D shapes of detected objects automatically from a single 2D image. This reconstruction generates realistic views of 3D models, and also provides accurate 3D information for entire objects. We demonstrate its usefulness in three applications: robot manipulation, object detection using 2.5D data, and generating 3D 'pop-up' models from photos. By Han-Pang Chiu. Ph.D.
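    A minimal sketch of the data-multiplication idea described above follows; the homography, file names, and output size are illustrative assumptions, not the Potemkin model's actual part-based transforms.

```python
# Sketch: warp a labeled image from a source viewpoint toward a target
# viewpoint and add the result to the target view's training set.
# The transform and file names are assumptions for illustration; the
# Potemkin model estimates its transforms from a few labeled views.
import cv2
import numpy as np

def synthesize_view(image, view_to_view_H):
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, view_to_view_H, (w, h))

image = cv2.imread("car_side.jpg")                  # hypothetical source-view image
H = np.array([[0.9, 0.05, 12.0],                    # hypothetical side -> 3/4 view map
              [0.02, 1.0, -6.0],
              [1e-4, 0.0, 1.0]])
if image is not None:
    cv2.imwrite("car_three_quarter_synth.jpg", synthesize_view(image, H))
```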

    Automatic Class-Specific 3D Reconstruction from a Single Image

    Our goal is to automatically reconstruct 3D objects from a single image, by using prior 3D shape models of classes. The shape models, defined as a collection of oriented primitive shapes centered at fixed 3D positions, can be learned from a few labeled images for each class. The 3D class model can then be used to estimate the 3D shape of an object instance, including occluded parts, from a single image. We provide a quantitative evaluation of the shape estimation process on real objects and demonstrate its usefulness in three applications: robot manipulation, object detection, and generating 3D 'pop-up' models from photos.
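    One plausible way to represent such a class shape model in code is sketched below: a set of oriented primitives at fixed 3D positions plus a projection helper. The types and field names are our own illustration, not the authors' implementation.

```python
# Sketch of a class shape model as oriented primitives at fixed 3D positions.
# Field names and the projection helper are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class OrientedPrimitive:
    center: np.ndarray      # fixed 3D position, shape (3,)
    rotation: np.ndarray    # 3x3 orientation of the primitive
    extents: np.ndarray     # half-sizes along the primitive's local axes

@dataclass
class ClassShapeModel:
    class_name: str
    parts: List[OrientedPrimitive] = field(default_factory=list)

    def project_centers(self, K, R, t):
        """Project each part center into an image with intrinsics K and camera pose (R, t)."""
        uv = []
        for part in self.parts:
            p_cam = R @ part.center + t
            p_img = K @ p_cam
            uv.append(p_img[:2] / p_img[2])
        return np.array(uv)
```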

    2D Z-string: A new spatial knowledge representation of image databases

    The knowledge structure called the 2D C⁺-string, proposed by Huang et al. to represent symbolic pictures, allows a natural way to construct iconic indexes for images. According to the cutting mechanism of the 2D C⁺-string, an object may be partitioned into several subparts. The number of partitioned subparts is bounded by O(n²), where n is the number of objects in the image. Hence, the string length is also bounded by O(n²). In this paper, we propose a new spatial knowledge representation called the 2D Z-string. Since there are no cuttings between objects in the 2D Z-string, the integrity of objects is preserved and the string length is bounded by O(n). Finally, some experiments are conducted to compare the performance of both approaches.
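    As a toy illustration of the symbolic-projection family both strings belong to (this shows the basic 2D-string idea of Chang et al., not the 2D Z-string's own cutting-free encoding rules), the sketch below emits one symbol per object per axis, so the string length stays O(n).

```python
# Toy illustration of a 1D symbolic projection string: order whole objects
# (no cutting) along one axis by their bounding boxes. This is the basic
# 2D-string idea only, not the 2D Z-string's actual encoding.
def projection_string(objects, axis):
    # objects: dict name -> (xmin, ymin, xmax, ymax); axis 0 = x, axis 1 = y
    order = sorted(objects.items(), key=lambda kv: kv[1][axis])
    return " < ".join(name for name, _ in order)

picture = {"tree": (0, 0, 2, 5), "house": (3, 1, 8, 4), "sun": (1, 7, 2, 8)}
print(projection_string(picture, 0))   # tree < sun < house   (order along x)
print(projection_string(picture, 1))   # tree < house < sun   (order along y)
```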

    SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments

    Semantic reasoning and dynamic planning capabilities are crucial for an autonomous agent to perform complex navigation tasks in unknown environments. Succeeding in these tasks requires a large amount of the common-sense knowledge that humans possess. We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments. SayNav uses a novel grounding mechanism that incrementally builds a 3D scene graph of the explored environment and provides it as input to LLMs for generating feasible and contextually appropriate high-level plans for navigation. The LLM-generated plan is then executed by a pre-trained low-level planner that treats each planned step as a short-distance point-goal navigation sub-task. SayNav dynamically generates step-by-step instructions during navigation and continuously refines future steps based on newly perceived information. We evaluate SayNav on a new multi-object navigation task that requires the agent to utilize a massive amount of human knowledge to efficiently search for multiple different objects in an unknown environment. SayNav outperforms an oracle-based Point-nav baseline, achieving a success rate of 95.35% (vs. 56.06% for the baseline) under ideal settings on this task, highlighting its ability to generate dynamic plans for successfully locating objects in large-scale new environments. In addition, SayNav also enables efficient generalization of learning to navigate from simulation to real novel environments.
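    The plan-execute-refine loop described above can be sketched roughly as follows; every callable passed in (scene-graph builder, LLM query, point-goal policy, object detector) is a hypothetical stand-in supplied by the caller, not SayNav's actual interface.

```python
# Rough sketch of an incremental plan-execute-refine loop in the style the
# abstract describes. All four callables are hypothetical stand-ins; nothing
# here is SayNav's actual API.
def plan_execute_refine(build_scene_graph, query_llm, point_goal_nav, detect_objects,
                        targets, max_rounds=50):
    found = set()
    for _ in range(max_rounds):
        remaining = set(targets) - found
        if not remaining:
            break
        scene_graph = build_scene_graph()             # incrementally built 3D scene graph
        plan = query_llm(scene_graph, remaining)      # LLM proposes short-horizon high-level steps
        for goal_position in plan:
            point_goal_nav(goal_position)             # pre-trained low-level point-goal policy
            found |= detect_objects() & set(targets)  # check newly perceived objects
            if found & remaining:                     # new information: re-plan remaining steps
                break
    return found
```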

    Cross-View Visual Geo-Localization for Outdoor Augmented Reality

    Precise estimation of global orientation and location is critical to ensure a compelling outdoor Augmented Reality (AR) experience. We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database. Recently, neural network-based methods have shown state-of-the-art performance in cross-view matching. However, most prior work focuses only on location estimation and ignores orientation, which does not meet the requirements of outdoor AR applications. We propose a new transformer neural network-based model and a modified triplet ranking loss for joint location and orientation estimation. Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance. Furthermore, we present an approach to extend the single image query-based geo-localization approach by utilizing temporal information from a navigation pipeline for robust continuous geo-localization. Experimentation on several large-scale real-world video sequences demonstrates that our approach enables high-precision and stable AR insertion. Comment: IEEE VR 202
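    For context, the sketch below shows the standard soft-margin triplet ranking loss widely used for cross-view matching; the paper's modified loss, which additionally couples orientation estimation, is not reproduced here.

```python
# Sketch of a standard soft-margin triplet ranking loss for cross-view matching
# (matched ground/aerial pairs share a batch index). The paper's modified,
# orientation-aware loss is not reproduced here.
import torch
import torch.nn.functional as F

def soft_margin_triplet_loss(ground_emb, aerial_emb, alpha=10.0):
    ground_emb = F.normalize(ground_emb, dim=1)
    aerial_emb = F.normalize(aerial_emb, dim=1)
    sim = ground_emb @ aerial_emb.t()                 # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)                     # similarity of each matched pair
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # Push every matched pair above every mismatched pair in the ranking.
    loss = torch.log1p(torch.exp(alpha * (sim - pos)))
    return loss[off_diag].mean()

# usage sketch: loss = soft_margin_triplet_loss(ground_net(images), aerial_net(tiles))
```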